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ABSTRACT 


Our  research  is  mainly  motivated  first  by  our  desire  to  develop  a  theoretical  foundation  for  hier¬ 
archical  dynamic  game  framework  and  secondly  by  the  desire  to  demonstrate  the  applicability  of 
such  theory  in  the  area  of  for  sensor  resource  management  of  large  sensor  networks  of  highly 
decentralized,  distributed  nature  for  the  purpose  of  multi-level  data  fusion.  We  briefly  describe 
the  historical  context  of  game  theory  over  the  past  80  years  out  of  which  our  research  ideas  were 
engendered.  We  then  present  three  new  enabling  approaches  to  game  theory  that  makes  it  more 
applicable  than  past  applications  of  2-person  static  game,  and  describe  them  in  detail,  including 
results  of  simulations. 
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1.  INTRODUCTION 


1.1.  BRIEF  HISTORY 

Game  theory,  a  mathematical  study  of  strategies  in  games,  has  a  long  and  rich  history  dating  all 
back  to  early  1700’s,  and  found  a  firm  mathematical  footing  in  1928  when  von  Neumann  pub¬ 
lished  his  famous  paper.  In  his  work.  Von  Neumann  showed  that  every  two-person  zero-sum 
game  has  a  maxi-min  solution  in  either  pure  or  mixed  strategies.  In  other  words,  in  games  in 
which  one  player's  winnings  equal  the  other  player's  losses,  Neumann  and  Morgenstem  showed 
that  it  is  rational  for  each  player  to  choose  the  strategy  that  maximizes  his  minimum  payoff,  giv¬ 
ing  rise  to  a  notion  of  equilibrium  solution,  i.e.  a  pair  of  strategies,  one  for  each  player,  with 
which  each  player  can  be  most  satisfied  with,  given  that  the  other  play  does  not  alter  his  strategy. 


Figure  1:  Von  Neumann,  Nash,  Aumann,  &  Harsanyi  -  pioneers  of  2-person,  N-person,  Dynamic  and  Uncertain  Games 


This  seminal  work  brought  to  the  field  of  game  theory  quite  an  excitement  and  even  daring  hope 
among  some  mathematicians  that  it  would  do  to  the  field  of  economics  what  Newtonian  calculus 
did  for  physics.  Most  importantly,  it  ushered  in  to  the  field  two  bright  young  researchers  who 
would  significantly  extend  the  outer  limits  of  game  theory.  A  few  such  young  minds  particularly 
stand  out.  One  is  John  F.  Nash,  a  Princeton  mathematician  who,  in  his  1951  Ph.D.  thesis,  [give 
reference  here]  extended  Neumann’s  theory  to  N-person  non-cooperative  games  and  established 
the  notion  of  what  is  now  famously  known  as  Nash-equilibrium  that  eventually  brought  him  the 
Nobel  recognition  in  1994.  Another  is  Robert  Aumann,  who  took  a  departure  from  this  fast  de¬ 
veloping  field  of  finite  static  games  in  the  tradition  of  Neumann  and  Nash,  and  instead  studied 
the  dynamic  version  of  games,  where  the  strategies  of  players  and  sometimes  games  themselves 
change  along  a  time-parameter  t  according  to  some  set  of  dynamic  equations  that  govern  such  a 
change.  Aumann  contributed  much  to  what  we  know  as  theory  of  dynamic  games  today  and  his 
work  eventually  also  garnered  him  a  Nobel  Prize  in  2002.  Rufus  Isaacs,  who  took  a  departure 
from  this  fast  developing  field  of  finite  discrete-time  games  in  the  tradition  of  Neumann,  Nash, 
and  Aumann,  instead  studied  the  continuous-time  version  of  games,  where  the  way  in  which 
strategies  of  players  determine  the  state  (or  trajectories)  of  players  depends  continuously  on 
time-parameter  t  according  to  some  partial  differential  equation.  Isaacs,  who  worked  as  engineer 
on  warplane  propellers  during  World  War  II,  joined  the  mathematics  department  of  RAND  cor¬ 
poration,  a  cold  war  think  tank  after  the  war,  and  there  he  passionately  developed  the  theory  of 
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such  game-theoretic  differential  equations,  a  theory  now  called  differential  games,  with  the  topic 
of  warfare  strategy  as  the  foremost  application  in  his  mind.  Around  the  same  time,  John  Har- 
sanyi  made  a  key  recognition  of  the  imperfectness  and  incompleteness  of  information  that  is  in¬ 
herent  in  many  practical  games,  and  thus  started  a  theory  of  uncertain  games  (often  also  known 
as  games  with  incomplete  information)  and  was  also  awarded  a  Nobel  prize  in  1994  along  with 
John  F.  Nash. 

1.2.  CURRENT  RELEVANCE  AND  NEED  FOR  NEW  APPROACHES 


Despite  the  preponderance  of  their  military  applications,  however,  the  two-person  games,  re¬ 
search  of  which  flourished  under  the  clouds  of  the  two  super-power-cold-war,  have  limitations  in 
this  post  cold  war,  post  September  1 1  era.  One  obvious  limitation  is  the  apparent  lack  of  an  effi¬ 
cient  2-person  game-solver.  Of  course,  Neumann  and  Nash  showed  that  such  a  solution  exists 
and  later  in  1 980’ s,  it  was  shown  that  every  finite  2-person  game  can  be  turned  into  a  linear  pro¬ 
gramming  problem,  which  then  can  be  solved  using  a  number  of  different  techniques,  say,  by  a 
simplex  method  for  example.  However,  such  a  translation  is  not  only  computationally  cumber¬ 
some,  but  furthermore,  some  structure  of  the  game,  such  as  symmetry,  asymmetry,  dominance, 
etc.,  and  thus  some  potential  insight  into  the  game  are  often  lost  during  this  translation,  making  it 
desirable  for  an  efficient  2-person  game-solver  that  preserves  and  even  exploits  structures  of  the 
game  in  question.  Our  research  is  in  part  motivated  by  our  desire  to  look  for  just  such  an  effi¬ 
cient  2-person  game  solver. 

Moreover,  game  theory  research  applied  to  defense-related  problems  has  heretofore  mostly  fo¬ 
cused  on  static  games.  This  was  consistent  with  the  traditional  belief  that  our  adversary  has  a 
pre-defined  set  of  military  strategies  that  he  has  perfected  over  many  years  (especially  during 
the  cold-war)  as  well  as  relatively  short  time  of  engagement  during  which  one  fixed  adversarial 
strategy  is  committed.  However  in  this  post  September- 1 1  era,  there  is  an  increasing  awareness 
that  the  adversary  is  constantly  changing  his  attack  strategies,  and  such  variability  of  adversarial 
strategies  and  even  of  the  game  from  which  these  strategies  are  derived  call  for  application  of 
dynamic  games  to  address  the  current.  However,  although  solutions  of  any  dynamic  2-person 
game  are  known  to  exist  by  the  Folk  theorem,  apparent  lack  of  an  on-line  technique  to  solver  dy¬ 
namic  games  in  an  iterative  fashion  has  stymied  such  an  application.  In  our  research,  we  present 
just  such  an  on-line  iterative  method  to  solve  dynamic  games  using  insights  from  Kalman  filter¬ 
ing  techniques. 

Furthermore,  while  the  interaction  between  offense  and  defense  can  be  effectively  modeled  as  a 
2-person  game,  the  resource  allocation  problem  among  different  assets  of  a  sensor  system  can 
further  be  modeled  as  an  N-person  game.  Thanks  to  Nash’s  seminal  work  in  this  area,  we  know 
an  equilibrium  solution  exists  for  such  N-person  games,  though  actually  finding  an  equilibrium  is 
a  highly  non-trivial  task  (we  recall  Nash’s  proof  of  his  theorem  is  a  non-constructive  theorem). 
There  do  exist  a  number  of  heuristic  methods  to  estimating  and  finding  the  equilibrium  solutions 
of  N-person  games,  but  in  this  research,  we  propose  a  new  method  of  solving  N-person  games 
inspired  by  the  Brouwer’s  fixed  point  theorem  (a  crucial  ingredient  in  Nash's  doctoral  thesis), 
and  show  how  such  methods  can  be  employed  to  optimally  manage  assets  of  a  sensor  system  in 
competition,  coordination  and  cooperation  against  a  common  adversary \ 
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Finally,  we  present  a  sensor  network  simulation  environment  with  sensor  resource  management 
(SRM)  algorithms  that  brings  the  three  aforementioned  ideas  together  in  a  hierarchical  and  dy¬ 
namic  way,  which  not  only  validates  our  ideas  but  also  can  serve  as  a  predictive  tool  of  what  the 
adversary  may  do  in  future  and  what  the  corresponding  defensive  strategy  should  be.  Our  ap¬ 
proach  is  based  on  realization  that  there  is  a  natural  hierarchical  breakdown  between  a  2-person 
game  that  models  the  interaction  between  a  sensor  network  and  its  common  adversary;  and  an  N- 
person  game  that  models  the  competition,  cooperation,  and  coordination  between  asset  of  the 
sensor  network.  We  have  developed  a  concept  of  operations  as  well  as  SW  prototypes  to  simu¬ 
late  such  an  environment  to  corroborate  our  theory. 

The  ultimate  goal  of  developing  these  approaches  is  that  they  would  be  used  for  improving  the 
importance  of  Space  Situational  Awareness  (SSA),  whose  importance  cannot  be 
overemphasized.  In  fact,  according  to  2005  Quadrennial  Defense  Review  (QDR),  the  United 
States  must  have  “unfettered,  reliable,  and  secure”  access  to  its  space  assets,  assured,  for  now,  by 
“improving  space  situational  awareness  and  protection,  and  through  other  space  control 
measures.”  [1]  This  echoed  the  findings  of  the  earlier  2001  version,  which  strongly  insisted  that 
“the  US  must  not  only  exploit  the  advantages  of  the  “high  ground”  of  space,  but  that  it  also 
should  develop  a  robust  means  to  deny  the  use  of  space  assets  to  any  adversary.”  We  strongly 
believe  one  rigorous  theoretical  foundation  to  improve  SSA  is  by  means  of  game  theory,  and  it  is 
our  ardent  hope  that  the  new  approaches  that  were  developed  through  this  research  will  make  a 
positive  contribution  in  this  regard. 
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2.  THEORY 


2.1.  CONOPS,  GAMES,  OVERALL  APPRAOCH 


Advanced  optimization-based  algorithms  for  Sensor  Resource  Management  (SRM)  have  been  a 
research  focus  area  for  multi-sensor  ISR  systems.  However,  these  algorithms  usually  offer  the 
potential  for  automating  the  sensor  control  process  in  response  to  level  1  sensor  data  fusion  (ob¬ 
ject  or  track-level)  estimates,  even  though  studies  have  indicated  that  such  SRM  algorithms  may 
have  limited  value  because  the  algorithms  are  optimized  for  track  maintenance  without  any  as¬ 
sessment  of  overall  situation  context.  To  overcome  such  inadequacies,  we  had  begun  developing 
a  framework  for  representing  the  expected  information  value  of  planned  sensor  measurements  as 
it  contributes  to  higher-level  situational  inferences,  in  our  previous  SBIR  Phase  II  effort  called 
RESE.  In  this  effort,  we  developed  a  hierarchical  valuation  model  that  estimates  target  value  on 
the  basis  of  Level  2  fusion  information  (group  identity)  as  well  as  Level  1  information,  and 
showed  that  the  optimization  algorithms  that  maximizes  this  newly  augmented  value  function 
gave  better  results  not  only  for  classifying  group  identity  but  also  for  track  qualities  of  targets  on 
the  ground. 

One  of  the  most  important  purposes  for  multi-sensor  ISR  system,  however,  is  to  infer  the 
adversarial  intent  given  the  current  estimate  of  the  adversarial  observables,  which  is  the  focus  of 
level-3  information  fusion.  This  is  a  hard  problem,  especially  given  that  such  intent  changes  over 
time  and  depends  on  the  enemy’s  perception  of  the  blue  force’s  intent.  However,  as  we  briefly 
saw  at  the  end  of  our  RESE  effort,  it  is  possible  to  use  one  mathematical  framework,  namely 
game  theory,  to  make  such  reasoning  rigorous  and  effect  a  high-level  data  fusion,  which  utilizes 
all  level-1,  level-2  and  level-3  information. 

Buoyed  by  this  encouraging  yet  ad-hoc  limited  result  near  the  end  of  RESE  effort,  we  decided  to 
find  a  theoretical  foundation  upon  which  we  would  be  able  to  explain  these  results,  which  led  us 
to  the  discovery  of  Kalman-filtering  techniques  for  2-person  dynamic  game  through  this  current 
research.  Furthermore,  aware  of  the  current  and  future  operational  need  to  manage  heterogeneous 
set  of  multiple  sensors  in  a  highly  dynamic,  distributed,  and  decentralized  environment,  we  again 
looked  to  game  theory  (this  time  N-person  game  theory),  treating  the  sensors  in  the  network, 
each  endowed  with  different  capabilities,  modalities,  and  communication  bandwidth,  as  a  player 
in  a  N-person  game  (N-person  coalitional  game,  especially  if  N  becomes  large).  This  has  the  ad¬ 
vantage  of  providing  common  framework  where  the  needs  and  capabilities  of  different  sensors 
can  be  compared  and  coordinated,  allowing  multiple  sensors  to  work  together  through  binding 
agreement  to  achieve  the  most  optimal  cooperative  resource  allocation. 

Figure  2  shows  how  our  overall  approach  will  work.  To  begin  with,  as  two-person  game  is 
defined  by  a  set  of  strategies  for  each  player  and  the  payoff  matrix  which  assigns  a  value  that 
each  player  perceives  for  himself  given  an  ensemble  of  strategies  that  each  player  has  made,  we 
will  begin  by  defining  an  overall  adversarial  space  interaction  between  blue  and  red  in  order  to 
turn  this  into  a  2-person  game,  where  blue  is  our  sensor  network  and  red  is  the  network  of 
adversarial  entities.  Though  our  setup  is  general  enough  for  either  zero-sum  game  or  bi-matrix 
game,  we  have  chosen  a  zero  sum  game  for  the  sake  of  simplicity.  Then,  by  solving  such  a  game 
for  the  adversarial  strategy  as  well  as  the  blue  strategy,  we  attempted  to  infer  as  well  adversarial 
intent  as  well  as  how  the  sensor  network  should  respond  to  such  an  intent. 
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Tactical  Level 
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S2:  Fly  East 


Static 
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Strategy  to 
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Payoff  Matrix 

Objective 

Translator 

Generator 

Figure  2:  Hierarchical  Dynamic-Game  Theoretic  Approach 


Inevitably,  such  definition  of  2-person  games  will  depend  on  many  factors,  which  come  with 
unavoidable  variability  (environmental  as  well  as  human),  and  thus,  we  will  use  the  theory  of 
uncertain  game  (otherwise  known  as  theory  of  games  with  incomplete  information)  throughout 
our  effort.  This  means  that  the  games  we  consider  actually  become  a  probability  distribution  over 
the  space  of  possible  games,  allowing  us  to  experiment  over  a  number  of  different  games  that 
model  the  interaction  between  the  offense  and  defense,  one  by  one  or  even  all  at  once. 

We  will  then  use  the  innovative  approaches  mentioned  in  the  previous  and  expounded  further  in 
the  following  section  to  build  our  hierarchical  dynamic  game-theoretic  SRM  approach,  which  is 
depicted  schematically  in  Figure  3.  As  we  described  above,  the  2-pcrson  game  that  models  the 
interaction  between  the  offensive  entity  and  defensive  sensor  network,  which  results  in  the 
common  strategy  that  the  sensor  assets  need  to  achieve  together,  in  response  to  the  strategy  that 
the  adversary  will  likely  adopt.  Such  a  common  strategy  is  then  passed  onto  different  sensor 
assets  at  which  point  Common  Asset  Coordinator  (CAC)  will  then  solve  an  N-person  game  in  a 
competitive,  coordinated,  or  a  cooperative  fashion  for  an  equilibrium  solution  which  will  give 
rise  to  a  strategy  that  each  asset  will  need  to  adopt,  in  order  to  ensure  optimal  resource  allocation 
among  all  the  assets.  Finally,  each  sensor  asset,  coupled  with  its  GMTI  tracker,  forms  a  TSP 
(Tracker  Sensor  Pair)  and  locally  employs  a  2-person  dynamic  game  of  its  own  in  order  to  make 
its  most  optimal  local  decision,  which  then  makes  a  decision  about  whether  to  choose  this  local 
decision  or  the  decision  produced  from  the  N-person  game  or  a  combination  of  both.  We  now 
describe  this  in  more  detail  in  the  next  section,  and  we  begin  by  reviewing  the  result  of  the  RESE 
effort. 
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Figure  3:  Interplay  between  2-person  game  and  N-person  Game 


2.2.  DEVELOPMENT  OF  EFFICIENT  TWO-PERSON  GAME  SOLVER 


We  have  developed  a  prototype  software  tool,  GameSolver,  for  solving  mathematical  games  that 
are  relevant  for  this  research  effort.  Of  course,  the  most  crucial  aspect  of  using  game  theory  is 
the  determination  of  equilibrium  solutions  (i.e.  Nash  Equilibrium  solution).  This  is  usually  a 
difficult  problem,  but  for  a  two-  person,  zero  sum  game,  i.e.,  blue’s  gain  equals  red’s  losses  and 
vice  versa,  there  arc  known  solution  methods  for  specific  payoff  matrix  structures  (combinatorial 
solution  methods  exist  also  for  2-person  non-zero  sum  games,  but  we  concentrated  our  efforts  to 
zero-sum  games  for  the  sake  of  simplicity).  More  specifically,  these  solution  methods  are 
adaptable  to  the  form  of  the  payoff  matrix: 

•  2x2  matrix  -  This  is  the  simplest  case.  For  example,  the  enemy  has  two  strategies:  wait 

or  attack,  and  the  blue  force  has  two  strategies:  wait  or  defend.  For  a  payoff  matrix  of 

-  b) 

\c  ^ ) ,  the  solution  can  be  written  explicitly: 


P\  “ 


qx  - 


d-b_ 

a  +  d - b-c ’ 
dj-c 

a+d -b-c' 


P2  =1“Pi 
<h  "1-9. 


where  (pi,  P2)  represents  the  blue  force  mixed  strategy  solution,  i.e.,  the  blue  force  should  adopt 
strategy  1  and  strategy  2  with  probability  pi  and  P2,  respectively;  similarly  for  the  red  force  and 
(qi<  qz). 
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•  Severely  Asymmetric  matrix  (m  x  2  or  2  x  n)  -  This  describes  when  one  side  has  a  much 
richer  set  of  strategies  than  the  other.  This  often  describes  situations  in  unconventional, 
asymmetric  warfare  in  this  age  of  GWOT,  where  insurgents  and  terrorist  are  known  to 
have  a  wide  variety  of  strategies  against  more  conventional  blue  force  strategies.  These 
types  of  games  can  be  readily  solved  with  graphical  analysis  on  the  set  of  linear  functions 
of  two  variables  (i.c.  lines). 

•  Square  payoff  matrix  (m  x  m)  -  The  offense  and  defense  have  an  equal  number  of 
strategies.  However,  unlike  the  2  x  2  case,  there  is  no  explicit  solution  formula  for  blue 
and  red  strategies.  Instead,  the  well-known  principle  of  indifference  can  be  used  to  set  up 
a  linear  system  of  m  equations  with  m  unknowns. 

•  Non-square  payoff  matrix  (m  x  n)  -  The  most  general  case  where  one’s  number  of 
strategies  is  not  limited  by  one’s  opponents.  In  this  case,  we  follow  the  clever  trick  first 
realized  by  G.  Owen  in  1982  to  turn  this  problem  into  a  linear  programming  problem,  and 
then  apply  the  well-known  simplex  method. 

•  Invariant  games  -  The  payoff  between  blue  and  red  do  not  change  under  some  1-1 
mapping  of  blue  and  red  strategies  onto  themselves.  In  this  case,  the  Nash  solution  is 
also  invariant,  thus  making  the  solution  easier  to  find.  We  should  first  reduce  the  game 
by  the  mapping. 


Figure  4  summarizes  our  approach: 


Figure  4:  Case  by  case  analysis  for  Efficiently  solving  2-person  games 


2.3.  FILTERING  TECHNIQUES  FOR  SOLVING  DYNAMIC  GAMES 
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As  much  as  the  two-person  game  theory  aptly  describes  the  adversarial  situation  that  must  be 
considered,  in  particular,  for  space  situation  awareness,  and  as  important  as  Neumann’s  and 
Nash’s  equilibrium  results  of  two-person  games  are,  they  are  fundamentally  about  static  games. 
Therefore,  they  do  not  easily  give  insights  to  a  dynamic  situation  where  the  game  is  repeated 
many  or  even  infinite  number  of  times  (repeated  games),  where  the  memory  of  players  add 
dynamics  to  the  games,  or  where  the  rules  of  the  games  themselves  change  (fully  dynamic 
games). 

There  are  theorems,  such  as  Folk  Theorem  [2],  which  proves  that  an  infinitely  repeated  game 
should  permit  the  players  to  design  equilibriums,  supported  by  threats,  with  outcomes  being 
Pareto  efficient.  Also,  at  the  limit  case  of  continuous  time  case,  instead  of  discrete  time,  we  may 
bring  to  bear  some  techniques  of  differential  games,  which  are  inspired  mostly  by  well-known 
techniques  of  partial  differential  equations  applied  to  Hamilton-Jacobi-Bellman  equation,  as 
pioneered  by  Rufus  Issacs  [3],  However,  unfortunately,  these  approaches  do  not  readily  give 
insights  to  how  to  select  and  adapt  strategies  as  the  game  changes  from  one  time  epoch  to  the 
next,  as  is  necessary  in  order  to  gain  space  situational  awareness  in  all  of  its  dynamics. 


Figure  5:  Pictorial  View  of  Filtering  Techniques  for  Dynamic  Games 


Therefore,  based  on  our  experience  in  Kalman  filtering  techniques  for  estimating  states  of 
moving  targets,  we  propose  a  different  approach  to  solve  dynamic  games  in  order  to  help  figure 
out  the  intent  of  an  adversary  to  gain  superior  space  situational  awareness.  The  key  observation 
in  our  approach  is,  ironically  speaking,  not  to  be  overly  concerned  about  Nash  equilibrium.  We 
recall  that  Nash  equilibrium  is  useful  when  a  player  does  not  know  what  the  other  player  may  do, 
and  thus  it  is  some  sense  a  safe  strategy  that  each  player  in  a  game  can  adopt.  However,  when  a 
sensor  network  is  employed  to  gain  awareness  of  a  situation  in  space,  each  time  a  measurement 
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is  made  by  a  sensor  (kinematic  state,  classification  state,  etc.,  depending  on  the  sensor  modality), 
a  measure  of  understanding  about  the  adversary  is  gained,  however  incremental  it  may  be.  Our 
approach  exploits  this  incremental  gain  each  time  a  sensor  in  a  sensor  network  is  employed  in 
order  to  solve  the  dynamic  game  in  question  and  gain  the  best  estimate  of  the  intent  the 
adversarial  space  entities,  which  may  still  continue  to  change. 

Our  approach  is  inspired  by  our  experience  with  Kalman  filtering,  as  described  pictorially  in 
Figure  5,  where  y-axis  represents  adversarial  strategy  and  jc-axis  represents  time.  As  in  Kalman 
filtering  paradigm,  our  approach  tries  to  find  the  right  balance  in  combining  the  prediction  of 
what  the  adversarial  intent  may  next  be,  governed  by  a  model  of  adversarial  strategy  transition 

(operator  1  in  the  figure)  and  the  most  recent  measurement  of  the  adversarial  intent,  given  by 
another  model  of  how  an  analyst  may  process  the  sensor  data  to  map  the  sensor  measurement  to 

a  given  set  of  adversarial  strategies  (operator  A*  1  in  the  figure).  However,  unlike  Kalman 
filtering,  a  third  factor  is  also  balanced  with  the  prediction  of  adversarial  strategy  and  the 
measurement  of  adversarial  strategy,  namely  Nash  equilibrium  strategy  for  the  adversary  (the 
strategy  that  the  adversary  will  most  likely  adopt  without  any  further  insight  into  friendly  force’s 
intent). 

Mathematically,  this  insight  translates  into  adding  a  third-term  to  the  famous  Kalman  filtering 
equations.  Let  us  start  from  the  original  Kalman  filtering  equations  as  below: 

$  -  y',-1  +  K,(x,  -  By1,'1) 
p;  =(I-K,B)P;-{ 

k,  =  p;-1bt{r  +  bp;-xbt)-x 

where 

A/ 

y,  ,  yt  is  state  estimate  of  a  target  at  time  instance  t,  given  observation  up  to  time  t,  t-1 
Xt  is  the  observation  at  time  instance  t 

P,  ,  P,  is  the  covariance  at  time  instance  t,  given  the  measurement  up  to  time  t ,  t-1 

Kt  is  the  Kalman  gain  at  time  instance  t 
B  is  the  measurement  operator 

Starting  from  these  equations,  we  derive  the  following  set  of  equations,  described  in  the  picture 
form  below  in  Figure  6: 
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Kalman  Filter  Equations 
y\=y!r+Kt{xt-By^) 

p;  =(i-k,b)p;-1 

K,  =  Pl,-lBT(R  +  BP;-lBTy 


s't  -  +  K'(A's,  -  A’-14>'-,s'r:i)  + 

c((P/-1)N(G,) 

Pf-(I-K,A)PT' 

k,  -  p;~\A)t(r + Ap;~\A)Tyl 

s]  -  estimated  strategy  at  time  t 
s,  =  true  strategy  at  time  t 
€>'  e  HOM  (2' ,  2'+1 ; %  models  strategy  transition 
2'  =  set  of  strategies  at  time  t 

A!  e HOM (2', 2'; 9t), models  IA' s  understand ing  of  enemy  strategy 
G ,  =  Game  at  time  t 

N(Gt )  =  a  Nash  equil.  solution  for  the  game  G,  at  time  t 
c,  =  a  Nash  equil.  solution  discount  factor  for  the  game  G,  at  time  t 
and  measures  how  much  analyst' s  reasoning  should  be  trusted 


Figure  6:  Mathematical  View  of  Filtering  Techniques  for  Dynamic  Games 


The  first  question  in  Figure  6: 

j>;  =  +  K,{ As,  -  A +  C,(p;-1)N(G,) 

shows  how  S't ,  next  estimate  of  the  adversarial  intent  given  by  its  next  strategy,  is  a  combination 
of  three  terms,  namely: 

■  :  prediction  of  the  next  adversarial  strategy  given  strategy  transition  model  <!>' 

■  K,(As,  -A'-’O'-'f,'-1):  measurement  of  the  current  adversarial  strategy  given  IA 
(Intelligence  Analyst’s  model  of  process  sensor  measurement  data 

■  )N(Gt) :  Nash  equilibrium  tempered  by  a  discount  factor 


Furthermore,  the  next  two  equations  describe  how  uncertainty  of  adversarial  strategy  (given  by 
the  covariance  term  P'  )  and  Kalman  gain  ( Kt  )  grow  under  this  dynamic  system.  To  gain 

further  understanding  of  this  dynamic  system,  we  plan  to  ask  the  following  questions  (&  more  as 
we  continue  the  research) : 

1)  Does  S\  converge  to  true  strategy? 

2)  If  the  answer  to  1)  is  yes,  how  fast  does  it  converge? 

3)  How  stable  is  this  dynamic  system? 

4)  What  is  a  natural  non-linear  extension  of  this  system?  (e.g.  what  is  its  analogue  to  scented 
Kalman  filter)? 
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Though  convergence  and  stability  of  these  new  dynamic  equations  above  still  need  to  be  proved, 
through  a  prototypical  simulation  environment  that  we  have  used  in  this  research,  we  have  a 
preliminary  empirical  verification  the  effectiveness  of  these  techniques  though  an  extensive 
Monte-Carlos  runs,  as  we  will  see  in  later  sections.  We  plan  to  focus  our  future  research  on  these 
questions  we  raised  above,  and  in  so  doing,  we  solidify  the  theory  of  our  fdtering  techniques  for 
dynamic  games  and  present  a  practical  and  computationally  feasible  way  to  understand 
adversarial  interactions  for  future  applications. 

2.4.  BROWDER’S  FIXED  POINT  APPORACH  FOR  N-PERSON  GAMES 

Through  this  research  effort, 
we  also  developed  innovative 
approaches  to  solve  N-person 
game,  for  which  there  is  no 
known  general  approach,  as 
Nash’s  famous  result  on 
equilibrium  was  an  existence 
proof  only,  not  a  constructive 
proof.  The  key  idea  for  N- 
person  game  is  not  to  resort  to 
some  heuristic  approach  in 

order  to  solve  N-person  games.  Figure  7:  Fixed  Point  Theorem  for  BA2 

as  it  is  usually  done  for  static  N- 

person  game,  but  rather  to  use  the  techniques  inspired  by  Browder’s  Fixed  Point  theorem  (Figure 
7)  used  in  Nash’s  original  Ph.D.  thesis.  For  example,  Nash’s  arguments  rest  upon  the  following 

equations  [5]  that  describe  a  transformation  from  one  strategy  £  =  ( Sl,S2,Si,...,Sn )  to  another 
=  (.S1,,S2’‘s3’"-)‘Oby  the  following  mapping: 

S,+'2<P,a(i  K, 

a 

where 

<Pia  -  max(0, pia<£)~  P,(£ )) 

£  is  an  n-tuple  of  mixed  strategies 

Pi(£)  is  the  corresponding  strategy  to  player  / 

pia  ( £  )  is  the  payoff  to  player  /  if  he  changes  to  (Xth  pure  strategy 

We  note  the  crux  of  Nash’s  proof  of  the  existence  of  equilibrium  solution(s)  boils  down  to  this 
fixed  theorem  (see  a  pictorial  explanation  description  of  the  theorem  above)  applied  to  the 

following  mapping  T  —■ *  £ '  and  by  investigating  the  delicate  arguments  that  Nash  used  to 
convert  these  fixed  points  into  his  famous  Nash  equilibrium  solutions,  we  also  plan  to  use  these 


Pictorial  view  of  Browdets  fixed  point  theorem 

Thm  (Browder):  A  continuous 

A  A 

function  from  a  ball  (of  any 

\ 

dimension)  to  it  self  must  leave 
at  least  one  point  fixed. 

Figure  7:  Fixed  Point  Theorem  for  BA2 
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arguments  to  find  a  constructive  methods  to  find  the  solutions  of  N-person  games.  One  possible 
approach  we  have  in  mind  is  to  first  start  from  a  set  of  sample  points  in  the  space  of  strategies, 
and  compute: 


C(£)  measures  the  degree  to  which  a  strategy  £  is  changed  by  the  map  T  .  We  will 

look  for  a  strategy  £  where  C(£)  is  close  to  1,  as  the  possible  initial  search  points  to  look  for 
equilibrium  points  to  start  such  iterative  search  process. 


2.5.  FICTIOUS  PLAY  FOR  DYNAMIC  N-PERSON  GAMES 

Whenever  sensor  nodes  can  establish  effective  communication  with  each  other,  we  propose  to 
use  the  theory  of  bargaining,  well  understood  in  game  theory  community,  to  model  their 
cooperation  and  coordination  among  sensors  in  order  to  achieve  the  most  optimal  solution 
(Praeto  optimal),  which  may  not  be  sometimes  achievable  in  non-cooperative  games. 


Fictitious  Play  for  Dynamic  Game 


Figure  8:  Fictitious  Play  for  N-person  games  of  Patek,  .et  al. 

Furthermore,  for  dynamic  N-person  games  with  a  common  objective  among  the  players,  we  plan 
to  investigate  for  future  research  the  applicability  of  the  techniques  that  Prof.  Steve  Patek  and  his 
colleagues  at  the  University  of  Virginia  have  developed  for  generating  endogenous  Nash 
equilibrium,  which  is  schematically  depicted  in  Figure  8,  using  fictitious  play  learning  algorithm. 


14 


Final  Report  on  Game-Theoretic  Sensor  Resource  Management 


In  considering  optimal  architecture  for  command  and  control  of  large  system  (which  is  a  key 
issue  for  any  future  sensor  systems),  Dr.  Patek  and  his  colleagues  have  studied  the  use  of  game 
theoretic  techniques  in  distributing  the  search  for  global  extrema  in  large-scale,  discrete 
stochastic  optimization  problems.  In  their  work,  they  attribute  “player”  status  to  each  asset 
involved  in  the  search  for  a  globally  optimal  solution,  each  having  a  utility  function  equal  to  the 
objective  function  of  the  optimization  problem  being  solved.  Then,  generalizing  the  “fictitious 
play”  algorithm  of  game  theory,  they  engage  all  players  in  a  repeated  process  of  “best 
responding”  to  the  past  search  activities  of  all  other  players  along  with  their  observations  about 
what  constitutes  an  optimal  solution.  In  applying  this  algorithm  to  resource  allocation  problems 
in  a  variety  of  application  domains,  it  was  observed  that  the  algorithm  converges  in  the  sense  that 
all  players  will  eventually  converge  on  a  pure  strategy  Nash  equilibrium,  corresponding  to  a 
globally  optimal  solution  when  the  underlying  objective  function  is  convex.  Under  suitable 
regularity  assumptions,  this  result  holds  in  general,  and  we  plan  to  explore  further  the  suitability 
of  fictitious  play-based  optimization  algorithms  for  managing  sensor  assets.  Through  such 
investigation,  we  hope  to  understand  coordination  needed  among  sensors  to  respond  to  threats 
exogenous  to  the  network,  especially  for  the  purpose  of  distributed  resource  management  for 
sensors. 

2.6.  SRM  ENVIRONMENT  FOR  HIGH  LEVEL  FUSION  (LOCAL) 


Figure  9  shows  how  we  may  put  together  some  of  the  approaches  we  described  so  far  to  devise  a 
local  version  of  hierarchical  dynamic  game-theoretic  sensor  simulation  environment,  which  we 
started  in  our  RESE  effort.  We  now  describe  this  in  more  details,  making  connection  with  multi¬ 
level  date  fusion: 
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2.6.1.  From  Level  1  to  Level  2  via  Statistical  Modeling 


The  key  connection  between  level  1  and  level  2  fusions  is  an  estimation  technique  that  wc  have 
developed  to  determine  the  identity  of  a  group  of  objects,  given  the  identities  of  the  individual 
objects  themselves.  The  basic  paradigm  is  as  follows: 

•  We  are  given  a  set  of  measurements,  associated  in  a  non-redundant  manner  with  objects. 

•  Objects  are  known  to  submit  to  a  finite  number  of  distinct  and  exhaustive  classifications. 

•  Objects  are  known  to  co-occur  in  groups,  where  each  group  is  defined  uniquely  by  its 
object  composition. 


Notional  Scenario 

Aggraata-Hypo 

Pr(h) 

Pr(h|y) 

Tank  Platoon: 

0.33 

0  10 

SAM  Unit: 

0.33 

0  82 

Supply  Unit: 

0.33 

0  08. 

0 


% 

Class  p(y2^<2)  Pr(x2(y) 

Tank 

4.8 

0.84 

Truck: 

0.77 

0.07  ^ 

0  05  // 

004  <y 

Class  p(y1M)Pr0<1|y) 
Tank:  3.4  0.76 
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0  1 

2.3 

Truck:  0.04  0  1  3 
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0  76 

Radar  3.2  0.01 
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0  13 

% 

SA.M:  2.8 
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Radar:  3.2 

n 

0.01 

Class  p(y5fe<5)  Pr(x5ty) 

H 

Tank:  1.4 
Truck  0.04 
SA.M:  9,3 
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0  01 

0.01 
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0  60 

Class  p(y4fe<4)  Pr(x4|y) 
Tank:  0.4  0.01 

Truck.  0  04  0  02 

SA.M  6.8  0  95 

Radar:  12.1  0.01 

Figure  10:  Notional  scenario  for  group  and  target  classification 


Given  prior  knowledge  on  the  relative  frequency  or  probability  of  specific  object  groupings,  how 
are  we  then  to  judge  which  hypothesis  of  group  identity  best  explains  the  available  object-level 
observations?  We  must  also  consider  the  impact  of  this  prior  knowledge  upon  the  classification 
of  individual  objects.  That  is,  how  should  the  context  in  which  an  object  is  observed  (the  pres¬ 
ence  of  other  observed  objects)  affect  our  classification  of  that  object?  More  precisely,  given  the 
following: 


ol,  ...on:  the  set  of  n  objects  in  the  group  G 
jel, ..,  xq:  possible  object  types 

y  =  (yl, ..,  jm):  the  set  of  object  measurements  on  the  n  objects 
h:  a  group  type  hypothesis  on  group  G, 
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and  assuming  that  we  have  per-object  measurement  likelihoods  conditioned  on  the  target  type 
p(yi  |  xj),  we  would  like  to  (a)  evaluate  the  probability  of  various  group  hypotheses  Pr(h  |  y),  and 
(b)  evaluate  the  updated  target  probabilities  Pr(xj  |  y)  (see  Figure  10  for  a  notional  scenario).  Wc 
tackle  the  problem  by  laying  out  a  precise  probabilistic  model  for  the  occurrence  and  observation 
of  groups  of  objects.  This  model  consists  of  three  components,  namely  (i)  a  prior  model,  (ii)  a 
detection  model,  and  (iii)  a  measurement  model.  More  precisely: 


(i)  The  Prior  Model:  The  prior  model  specifies  what  groups  of  objects  may  occur  and  the 
relative  frequencies  with  which  these  groups  occur.  Group  hypotheses  are  specified  by  the  group 
composition,  i.e.,  the  number  of  instances  of  each  object  class  are  present  within  the  group. 


(ii)  The  Detection  Model:  Our  observation  of  a  group  may  be  incomplete  in  that  the  objects 
themselves  may  be  difficult  to  identify.  To  model  our  uncertainty  as  to  whether  or  not  the  “cor¬ 
rect”  objects  were  detected  we  introduce  a  detection  model,  using  the  Poisson  process  based 
upon  the  probability  of  detection,  false  alarm  rate  that  is  consistent  with  expected  clutter  density. 


(iii)  The  Measurement  Model:  The  measurement  model  specifies  the  probabilistic  distribu¬ 
tion  of  measurements  conditioned  upon  each  of  the  possible  target  classifications.  Ambiguity  as 
to  the  correct  classification  of  an  observed  object  is  created  by  overlap  of  measurement  models 
for  different  object  classes. 


Overall  Model  Structure" 

Prior  Model 

Q  Hidden  Aggregate 

Detection  Model 

Observed  Aggregate 


? 

J\ 

■n 


Joint  State  Assignment 


•ment Model  j  j  •  •• 

o  o 


Q  Marginal  State  Assignments 
Q  Marginal  Measurements 


Figure  1 1:  Model  Structure 


These  three  processes  may  be  thought  of  as  being  “piped”  in  that  the  output  of  each  feeds  into 
the  next,  as  illustrated  in  Figure  1 1 .  The  tree  diagram  shown  there  illustrates  the  top-down  struc¬ 
ture  of  our  model.  The  prior  model  specifies  the  prior  probabilities  of  the  states  at  the  root  node. 
The  Detection  Model  determines  the  transition  probabilities  from  the  hidden  aggregate  to  the  de¬ 
tected  aggregate  -  fundamental  considerations  determine  the  probabilistic  mapping  from  the  ob¬ 
served  aggregate  state  to  the  various  object  classifications.  Finally,  the  measurement  model 
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gives  the  transition  probabilities  from  an  object-classification  node  to  the  associated  measure¬ 
ment. 

The  calculation  of  likelihoods  under  this  model  is  then  summarized  as  follows.  Using  Bayes' 
rule,  the  probability  of  an  aggregate  hypothesis  (h)  conditioned  upon  all  observations  (y)  is 

vmy)-p(yWm) 

Ply ) 

where  Pr(h)  is  given  by  the  prior  model,  p(y  |  h)  is  the  likelihood  of  the  measurements  condi¬ 
tioned  upon  h,  and  p(y)  is  a  normalizer: 

P(y)  =  ^P(y\h)Vr(h) 

Using  the  detection  model,  the  mapping  from  the  aggregate  hypothesis,  h,  to  the  detected  distri¬ 
bution,  d,  may  be  computed  by  calculating  transition  probabilities,  Pr (d  \  h).  Using  these  transi¬ 
tion  probabilities  and  the  chain-rule  we  have: 

p(y  I  h)=  ypiy  I  d)  Pr (d  I  h) 

d€±Rji,q) 

where  the  summation  is  taken  over  D(n,q),  the  set  of  all  possible  distributions  of  n  objects  into  q 
classes,  where  n  is  the  number  of  detected  objects. 


Finally,  using  the  measurement  model,  we  compute  the  likelihood  of  a  set  of  observations  y  con¬ 
ditioned  upon  the  distribution  of  detected  objects,  d ,  by  direct  calculation  using  the  chain-rule  as: 

P(,  |  <0  -  Jp(y  I  am*  |  </)  -  ^ ^ > 

The  former  sum  is  over  the  set  of  all  possible  assignments  of  q  object-classifications  to  n  objects. 
The  latter  formula  indicates  the  “brute-force”  method  of  calculating  the  measurement  likelihoods 
conditioned  upon  a  distribution  hypothesis.  The  summation  is  over  all  assignments  a  consistent 
with  the  hypothesized  distribution  d.  In  general,  this  computation  could  be  very  time  consuming 
and  may  not  be  feasible.  To  avoid  that,  we  propose  to  develop  an  approximate  evaluation  proc¬ 
ess  by  collapsing  the  entire  object  classification  probability  distributions  into  one  aggregated 
“group  classification  vector”  and  skip  the  enumeration  of  an  exponentially  growing  set  in  the 
joint  state  assignment. 


Impact  of  Group  Hypotheses  on  Value  Functions  for  Sensor  Resource  Management 

Given  the  set  of  track-level  measurements  y  and  the  probability  distribution  over  all  group  hy¬ 
pothesis  {Pr(/i  |  y)}h,  we  can  construct  a  value  function  (VL 12)  for  the  group  that  incorporates 
both  level  1  and  level  2  fusion  measurements: 


£12 


(G  |  >0-  ^)F(A)Pr(A  |  y)  ^  V {oj  \  y} )Pr(xy 


It) 
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In  this  value  function,  the  first  summation  represents  the  level  2-fusion  value,  where  V(h)  is  the 
commander’s  preference  (importance)  of  a  particular  group  type.  The  second  summation  repre¬ 
sents  the  level  1  fusion  value,  where  V(oj  \  yj )  is  the  value  of  objeet  j  given  yj,  the  objeet  level 
measurement  of  object  j  (note  that  this  valuation  is  itself  determined  via  the  commander’s  prefer¬ 
ence  on  the  object  type  as  well  as  the  target  probabilities  Pr(x/ 1  y)). 


To  utilize  this  value  function  for  sensor  resource  management,  we  rely  on  sensor  models  for  gen¬ 
erating  expected  objeet  level  measurements  y,  thereby  yielding  varying  level  1  and  level  2  val¬ 
ues,  henee  varying  total  value,  for  alternative  sensors  and  sensor  modes.  This  value  function  ean 
thus  be  used  to  determine  sensor  resource  allocation  decisions  that  refleet  the  commander’s  pref¬ 
erence  in  terms  of  object  or  group  types  as  well  as  the  capabilities  of  the  various  sensors  avail¬ 
able. 

2.6.2.  From  Level  2  to  Level  3  via  Uncertain  Games 

Exploitation  of  level  1  and  level  2  (red  force)  data  tends  to  be  a  local  (in  time  and  space)  process 
that  analyzes  data  within  a  particular  field  of  view  for  a  short  period  of  time.  In  contrast,  level  3 
data  exploitation,  determining  the  intent  of  the  enemy,  tends  to  be  a  more  global  process  (longer 
time-horizon,  looking  at  the  entire  AOI).  Inferring  the  red  force  intent  is  a  difficult  problem,  es¬ 
pecially  given  that  it  ehanges  over  time  and  depends  on  the  enemy’s  perception  of  the  blue  force 
intent.  As  we  argued  thus  far,  we  have  used  one  mathematical  framework  for  making  such  rea¬ 
soning  rigorous,  namely  game  theory. 

However,  we  found  that  just  a  textbook  statie  2-person  game  in  the  spirit  of  von  Neumann 
wasn’t  quite  sufficient  for  our  research.  This  was  due  to  the  uncertainty  in  our  problem  domain 
whieh  stems  from  the  uncertainty  in  the  red  foree  identity,  which  then  has  a  direet  bearing  on  the 
red  foree  intent.  Different  units  do  have  different  strategies  and  payoff  functions,  whieh  conse¬ 
quently  define  different  games  with  different  equilibrium  solutions.  To  illustrate,  consider  the 
example  shown  in  Figure  12,  where  the  blue  foree  must  ehoose  among  five  possible  strategies 
against  a  red  force  whose  identity  is  unknown  but  is  assumed  to  be  either  a  defensive,  offensive, 
or  supply  unit.  The  red  foree  unit  would  have  different  strategy  ehoiees  depending  on  its  identity 
-  for  example,  six  possible  attack  strategies  for  an  offensive  unit,  three  counter  measures  for  a 
defensive  unit,  and  three  transport  strategies  for  a  supply  unit.  Given  the  uncertainty  of  the  red 
force  identity  and  the  assumed  possible  strategies  for  eaeh  red  foree  unit  type,  the  blue  foree 
must  ehoose  its  strategy  that  will  be  most  sueeessful. 

To  resolve  the  red  foree  identity  uncertainty  and  thus  the  identity  of  the  game,  we  have  used  the 
output  of  the  level  2  data  exploitation,  in  particular  {Pr(/?  |  y)}/,,  the  probability  distribution  over 
all  group  hypothesis  h  given  the  track-level  measurements  y.  This  is  how  our  level  2  analysis  and 
algorithms  are  related  to  those  of  level  3. 
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Figure  12:  The  blue  force  must  determine  which  strategy  to  adopt  unsure  of  the  identity  of  the  red  force 

Under  an  assumption  of  the  red  force  identity,  we  can  define  the  two  person  game  that  the  blue 
force  and  red  force  should  play,  for  which  we  know  that  a  Nash  solution  exists,  thanks  to  the 
work  of  von  Neumann  and  Nash.  Therefore,  we  first  develop  a  mapping  between  red  force  unit 
types  and  the  set  of  strategies  and  related  payoff  functions,  which  define  the  game.  As  contex¬ 
tual  information  such  as  weather,  doctrines,  and  terrain  information  will  affect  red  force  strate¬ 
gies  and  payoffs,  such  information  will  be  taken  into  account  in  this  mapping,  in  future  research. 

Next,  given  the  game  defined  by  this  mapping,  it  still  remains  to  determine  the  Nash  equilibrium 
of  the  game,  and  thus  the  intent  of  the  red  force.  This  is  in  general  a  difficult  problem.  How¬ 
ever,  in  the  special  case  of  two-person  non-zero  sum  game  as  we  have  in  this  context,  Mangasar- 
ian  and  Stone  showed  that  the  Nash  equilibrium  can  be  systematically  found  using  techniques  of 
quadratic  programming.  Further,  given  the  uncertainty  in  the  enemy’s  identity,  our  analysis  will 
result  in  a  Nash  equilibrium  solution,  i.e.,  an  intent  distribution,  for  each  of  m  defined  games, 
each  game  corresponding  to  an  assumption  (hypothesis)  on  the  red  force  identity  (m= 3  in  the  ex¬ 
ample  in  Figure  12).  Though  this  set  of  intent  distributions  can  then  be  incorporated  back  into 
our  hierarchical  value  function,  we  have  set  aside  this  for  future  research. 


2.6.3.  SW  Prototype  Architecture  for  Game  theoretic  L1/L2/L3  Fusion 

Figure  13  shows  a  prototypical  view  of  the  software  architecture  for  the  simulation  environment 
that  we  have  built  which  simulates  game  theoretic  multi-level  fusion.  L1/L2/L3  SRM  module  is 
as  we  described  already  in  2.6.1  and  2.6.2.  We  now  proceed  to  describe  the  rest  of  modules  in 
more  details. 
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(unit  decomposition, 

red  strategy) 


Dynamic  Game  Module 


(blue  strategy,  red  strategy) 


Figure  13:  Prototypical  S  W  Architecture  for  Game  Theoretic  SRM  for  High  Level  Data  Fusion 


Dynamic  Game  Module 

A  finite  game  is  defined  by  a  set  of  strategies  for  each  player  and  the  payoff  matrix,  representing 
numerically  how  valuable  each  player  views  a  particular  set  of  strategy  choices.  The  situation 
modeled  here  between  the  blue  and  red  forces  can  be  described  as  a  two-person  zero  sum  game 
(in  future  work  we  will  consider  extensions  to  non  zero  sum  games).  As  we  mentioned  in  Section 
2.6.2,  since  the  blue  force  is  uncertain  of  the  enemy  type,  we  define  a  game  between  the  blue 
force  and  each  of  the  possible  red  brigade  type.  To  simplify  the  simulation  and  ensuing  analysis, 
without  loss  of  generality,  we  will  assume  that  each  enemy  brigade  type  has  the  same  set  of 
strategies.  Namely,  we  have  (for  this  simulation): 

•  Blue  Strategies  =  {act,  wait} 

•  Red  Brigade  Types  =  (offensive  brigade,  defensive  brigade,  deceptive  brigade) 
where 

o  Offensive  Brigade:  consists  of  2  units  of  type  A. 
o  Defensive  Brigade:  consists  of  2  units  of  type  B. 
o  Deceptive  Brigade:  consists  of  1  unit  of  type  A  and  1  unit  of  type  B. 

•  Red  strategies  =  {attack,  defend,  deceive} 

•  Payoff  Matrix: 

o 
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A(ojfensive)  = 

5  -2  1 

-5  3  2 

-1  2  2 

A(  defensive)  = 

1  -1  1 

’2  -1  3 

1 

A(deceptive)  = 

-2  1  - 

2 

where  the  first  and  second  row  of  each  payoff  matrix  corresponds  to  the  blue  force  act  and  wait 
strategies,  respectively,  while  the  first,  second,  and  third  columns  correspond  to  the  red  brigade 
attack ,  defend,  and  deceive  strategies,  respectively. 

To  illustrate  further,  we  look  at  th e  A(offensive)  payoff  matrix  in  more  details: 

•  If  the  enemy  offensive  unit  is  attacking  (column  1 ),  the  blue  force  gains  5  (in 
some  measure  of  utility)  if  it  employs  its  sensors  to  act,  i.e.,  sense  the  attack  (and 
the  offensive  unit  loses  5).  However,  the  blue  force  loses  5  if  it  fails  to  act  when 
the  offensive  unit  is  attacking,  while  the  offensive  enemy  unit  gams  5  as  it  is  at¬ 
tacking  and  thereby  achieving  its  goal  without  being  sensed. 

•  If  the  enemy  offensive  unit  is  defending  (column  2),  the  blue  force  loses  2  if  it 
employs  its  sensors  to  act  in  that  (a)  the  blue  force  loses  sensor  resources  and  (b) 
it  makes  it  easier  for  the  enemy  brigade  to  determine  that  they  are  being  sensed 
(which  may  prompt  the  enemy  to  change  its  strategy  -  the  dynamics  of  the  en¬ 
emy  strategies  will  be  explained  more  later).  By  waiting,  the  blue  force  saves 
sensor  resources  and  avoids  being  observed,  gaining  3. 

•  If  the  enemy  offensive  unit  is  deceiving  (column  3),  while  there  is  some  utility 
(gain  of  1 )  in  employing  blue  force  sensors,  it  is  of  greater  utility  (gain  of  2)  for 
the  blue  force  to  wait  before  committing  sensor  resources  given  that  the  enemy  is 
trying  to  deceive. 

The  payoff  matrices  for  the  defensive  and  deceptive  red  force  brigades  are  similarly  defined. 
The  values  of  the  entries  of  all  the  payoff  matrices  are  determined  somewhat  arbitrarily  as  a 
means  for  generating  proof-of-concept  results  in  our  simulation  exercises.  A  rigorous  methodol¬ 
ogy  for  determining  the  matrix  values  would  require  a  thorough  understanding  of  the  red  and 
blue  force  interactions  and  is  itself  a  topic  of  considerable  research  and  should  be  investigated  as 
a  top  research  for  future  research.  In  our  current  framework,  however,  the  payoff  matrices  are 
treated  as  input  to  the  simulation,  and,  therefore,  different  matrices  define  different  games,  one 
example  of  which  is  described  above. 


Bayesian  Response  Generator 

Given  the  set  of  payoff  matrices  as  defined  above  and  the  blue  force  estimate  of  the  enemy  strat¬ 
egy,  represented  by  a  probability  distribution  q  (which  is  an  input  from  L2/L3  simulation  mod¬ 
ule),  where: 

q  =  (qx,<h,<h) 
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and 


qx  =  probability  that  the  red  force  is  executing  the  attack  strategy 

q2=  probability  that  the  red  force  is  executing  the  defend  strategy 

=  probability  that  the  red  force  is  executing  the  deceive  strategy, 

one  can  then  calculate  the  Bayes  Response,  i.e.,  the  best  strategy  that  blue  force  should  take.  The 
Bayes  response  is  the  mixed  strategy  p  in  the  space  of  mixed  strategies  X*  that  achieves  the  fol¬ 
lowing  maximum: 

n 

ma =maXp6X‘  PT  A(i' 

where  u,y  represents  the  (i,  j)th  clement  of  the  payoff  matrix  (a  mixed  strategy  is  a  probability  dis¬ 
tribution  over  the  strategy  space,  e.g.,  a  2-dimensional  probability  vector  over  the  two  blue  force 
strategies  in  our  case;  a  pure  strategy  is  the  special  case  of  a  unitary  mixed  strategy).  As  we  are 
uncertain  as  to  the  enemy  type  (offensive  brigade,  defensive  brigade,  deceptive  brigade),  we 
compute  the  Bayes  response  for  each  of  the  three  payoff  matrices  corresponding  to  the  enemy 
type  given  the  enemy  strategy  q  as  follows: 

•  Bayes _p(offensive)  =  argmax(p)  pCX-  pT A{offensive)q 

•  Bayes _p(defensive)  =  argmax(p)  .  pT A{def  ensivif  <; 

•  Bayes _p(deceptive)  =  argmax(p)  .  pT A(deceptive)q 

Then  we  define  the  best  response  (Bayes  response),  Bayes _p,  as  follows: 

Bayes _p  =  Pr(offensive)*Bayes  _p(oJfensive)  +  Pr(defensive)*Bay’es _p(defensive)  + 
Pr(deceptive)  *Bayes _p(deceptive) 

where 

•  Pr(offensive)  =  probability  that  red  is  an  offensive  brigade,  i.e.  both  units  are  of 
type  A 

•  Pr(defensive)  =  probability  that  red  is  a  defensive  brigade,  i.e.  both  units  arc  of 
type  B 

•  Pr(deceptive)  =  probability  that  red  is  a  deceptive  brigade,  i.e.,  1  unit  each  of 
types  A  and  B. 

These  probabilities  over  brigade  types  are  computed  via  the  individual  unit  probabilities  of  the 
units  forming  each  brigade,  the  output  of  the  level  1 /level  2  SRM  module. 

Game  Solver 

While  the  Bayes  response  provides  the  best  response  that  blue  should  take  given  the  current  es¬ 
timate  q  of  the  red  force  strategies,  the  red  force  may  change  its  strategy  assuming  that  the  blue 
will  adopt  a  Bayes  response.  Of  course,  anticipating  such  a  change  from  the  red,  the  blue  may 
change  strategies  again,  thereby  prompting  a  red  strategy  change,  and  so  on.  Therefore,  each 
side  must  determine  the  best  strategy  to  adopt  without  knowledge  of  the  opposing  side’s  strategy 


23 


Final  Report  on  Game-Theoretic  Sensor  Resource  Management 


choice.  Assuming  that  each  side  acts  rationally,  the  Nash  solution  provides  the  strategies  that 
each  side  is  most  likely  to  adopt.  For  each  of  the  three  games  defined  above,  the  Nash  equilib¬ 
rium  is  determined  as  follows: 


For 


A(ojfensive)  = 
p(offensive)  = 


5  -2  1 
-5  3  2 


For  A(defensive)  = 

•  p(defensive)  = 
For  A(deceptive)  = 

•  p{deceptive )  = 


_8_  1_ 
15  15 


,  q{offensive )  = 


■  i  0 

3  3 


-12  2 
1  -1  1 


1  I 

2  2 

2-13 
-2  1  -2 


,  q(defensive )  = 


I  2  0 

3  3 


2  3 
5  5 


,  q(deceptive)  = 


3  2 
5  5 


We  then  define  Game _p  and  Game_q,  the  weighted  Nash  equilibrium  over  the  three  payoff  ma¬ 
trices  corresponding  to  the  enemy  brigade  types,  analogously  as  the  Bayes  response: 

•  Game _p  =  Pr(offensive)  *p(offensive)  +  Pr(defensive)*p(defensive)  + 

Pr(deceptive)  *p(deceptive) 

•  Game_q  =Pr(offensive)*q(offensive)  +  Pr(defensive)*q(defensive)  + 

Pr(deceptive)  *q(deceptive) 

We  next  combine  the  Bayes  Response  and  the  weighted  Nash  solution  as  follows: 

•  Output _  p=  tx  Bayes _  p+  (1  -  f)x  Game_  p 


•  Output _q  =  tx.  q+(\- t)x  Game _q , 

where  the  parameter  t  is  the  measure  of  how  much  one  wants  to  emphasize  the  current  estimate 
of  the  enemy’s  strategy  as  opposed  to  the  Nash  equilibrium  solution  (this  is  akin  to  the  usual 
technique  in  linear  filtering  theory,  where  (Bayes _p,  Bayes  q)  is  an  observation  and  (Game _p, 
Game_q)  is  a  prediction).  (Output _p.  Output _q),  a  pair  of  mixed  strategies,  is  then  sent  to  L2/L3 
Simulation  Module  which  we  now  discuss. 


L2/L3  Simulation  Module 

The  Dynamic  Game  Module  provides  Output _p,  the  blue  force  next  sensor  decision  (action  or  no 
action),  and  Output _q,  a  probabilistic  assessment  of  the  enemy  strategy.  The  L2/L3  Simulation 
Module  utilizes  these  output  strategies  as  follows: 
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Using  the  Blue  Strategy. 

The  blue  force  strategy  (act  or  wait)  is  determined  via  the  generation  of  a  random  variable 
against  the  output  probability  vector  Output _p.  In  the  case  of  action,  we  utilize  the  level  I  and 
Level  II  valuation  function  to  select  a  sensor  mode  as  described  in  2.6. 1 .  The  current  track  qual¬ 
ity  state  used  in  the  level  II  valuation  function  for  SRM  includes  a  24  Joint  Kinematic  and  Clas¬ 
sification  (JKC)  state  Markov  model.  These  JKC  states  discretize  the  tracking  and  classification 
quality  into  a  finite  number  of  possibilities  so  that  one  could  relate  the  sensor  management  deci¬ 
sion  to  the  valuation  function  in  a  compact  and  computationally  efficient  manner.  In  the  case  of 
no  action,  we  extended  our  simulation  algorithms  to  evolve  the  JKC  state  under  the  assumption 
of  no  look. 

Using  the.  Red_  Strategy. 

The  assessed  enemy  strategy  Output_q  is  used  to  predict  the  enemy  strategy  for  the  next  time 
step  via  a  two-step  process.  First,  we  model  the  evolution  of  the  enemy  strategy  from  one  time 
step  to  the  next  in  the  Strategy  Dynamic  Model  module,  and  then  we  update  the  strategy  prob¬ 
ability  based  on  analyst  assessment  given  sensor  observation  in  the  Analyst  Reasoning  module 
and  Strategy  Dynamic  Module. 

Strategy  Transition  Model  Module:  We  utilize  a  Markov  transition  model  to  represent  the  evolu¬ 
tion  of  the  enemy  strategy,  and  thus  this  module  models  how  adversarial  strategy  transitions 
when  the  adversary  is  confident  that  it  is  not  being  watched.  We  used  a  linear  map  (transition 
matrix)  for  starters  and  will  investigate  more  complicated  model  in  future  research.  Example  of 
this  linear  map  where  the  adversary  employs  three  strategies  (i.e.  sl=Attack;  s2=Defend; 
s3=Deceive),  is  given  below: 


0.95 

0.25 

0.25' 

A*'1  = 

.025 

.95 

.025 

.025 

.025 

.95 

The  three  rows  and  columns  represent  the  three  enemy  strategies  (attack,  defend,  deceive).  To 
illustrate,  the  first  row  indicates  that  if  the  enemy  is  currently  attacking,  at  the  next  time  step,  the 
enemy  will  attack  with  probability  .95,  and  defend  or  deceive  with  probability  .025  respectively. 

Analyst  Reasoning  Module:  Wc  utilized  a  confusion  matrix  as  a  function  of  sensor  mode  to 
model  the  analyst  assessment  (a  more  elaborate  model  can  be  developed  in  the  future).  For  in¬ 
stance,  GMTI  mode  has  a  better  ability  to  detect  moving  targets  and  therefore  the  analyst  can 
better  identify  the  enemy  strategy  with  the  observation.  The  confusion  matrices  are  used  to  gen¬ 
erate  random  hypothetical  observations  based  on  true  enemy  strategy.  The  confusion  matrices 
for  GMTI  and  HRR  currently  utilized  are  as  follows: 


(.1  .1  .2\ 

L 5  .2  ,3\ 

O  '(GMTI)  = 

.2  .4  .4 

O  '(HRR)  = 

.2  .6  .2 

(.4  .2  .4, 

l-4  1 
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To  illustrate,  the  first  row  of  the  GMTI  confusion  matrix  indicates  than  if  the  enemy  is  attacking, 
an  analyst  receiving  a  GMTI  report  will  identify  the  enemy  strategy  as  attacking  with  probability 
.7,  defending  with  probability  .1,  and  deceiving  with  probability  .2. 

We  utilize  the  confusion  matrix  corresponding  to  the  sensor  mode  chosen  by  the  blue  force  under 
the  blue  force  action  strategy  decision.  To  determine  the  correct  row  of  the  given  confusion  ma¬ 
trix,  we  utilized  enemy  ground  truth  via  the  Enemy  Dynamic  Model. 

Enemy  Dynamic  Module 

This  module  models  by  a  Markov  process  how  an  adversary  may  adapt  strategy  when  detecting 
surveillance  from  the  ballistic  missile  defense  system.  An  example  is  pictorially  depicted  below: 


Figure  14:  Markov  Chain  for  Enemy  Dynamic  Module 


2.6.4.  Simulation  and  Results 

We  conduct  a  number  of  Monte  Carlo  simulations  where  the  initial  enemy  strategy  was  selected 
either  randomly  or  based  on  the  enemy  unit  composition.  In  each  trial,  we  conduct  100  time 
steps  where  a  dynamic  sensor  decision  is  made  at  each  step.  In  the  simulation,  we  try  three  sce¬ 
narios  where  each  scenario  consists  of  different  composition  of  units  (AA,  AB,  and  BB).  We 
assume  that  the  value  of  the  two  unit  types  can  be  chosen  based  on  the  commander’s  preference. 
Similarly,  the  value  of  each  target  types  can  also  be  assigned  accordingly. 

In  the  simulation,  we  run  50  Monte  Carlo  trials  for  each  scenario.  In  each  scenario  the  window 
size  for  sensor  action  is  set  to  be  10  and  the  decision  threshold  is  set  differently  in  each  test.  For 
example,  with  threshold  equal  to  40%,  Figures  15-1  thru  15-3  show  the  results  of  a  typical  trial. 
Figure  15-1  shows  that  about  36%  of  the  time  sensor  is  off  (mode  #3)  and  the  rest  sensor  is  on 
(mode#l  for  GMTI,  mode#2  for  HRR  on  unit  1,  and  mode  #2  for  HRR  on  unit  2).  Figure  15-2 
shows  that  in  this  trial,  the  enemy’s  strategy  has  changed  from  1  (offense)  to  2  (defense),  then 
back  to  1  and  then  later  move  to  3  (deceptive),  and  back  to  1.  Figure  15-2  also  shows  that  about 
44%  of  the  time,  the  most  likely  strategy  of  our  assessment  is  the  true  one.  Figure  15-3  shows 
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the  assessed  probability  of  the  three  strategies  as  well  as  the  assessed  probability  of  the  true  strat¬ 
egy. 


In  each  test,  we  also  compare  the  game  solver  with  heuristic  algorithm  where  the  sensor  ac- 
tion/no-action  decision  was  assigned  based  on  a  pre-specified  probability.  We  have  found  out  in 
general,  the  performance  of  heuristic  solver  is  significantly  worse  than  the  one  with  game  solver. 
This  is  understandable  since  the  enemy  strategy  changes  frequently  due  to  our  sensor  actions.  It 
is  much  more  difficult  to  assess  enemy’s  strategy  without  an  interactive  game  theoretic  strategy 
analyzer. 


Figures  15-4  thru  15-9  shows  the  overall  performance  comparison  between  the  heuristic  ap¬ 
proach  and  the  game  solver  approach.  In  this  test,  we  set  the  size  of  time  window  for  enemy 
strategy  policy  to  be  10  and  the  decision  threshold  for  change  is  80%  (i.e.,  our  sensor  will  need 
to  be  on  greater  than  80%  of  time  in  the  preceding  time  window  of  size  10  before  the  enemy  will 
potentially  change  strategy).  Note  that  in  the  figures,  we  display  both  the  mean  and  the  one- 
sigma  confidence  interval  for  both  probability  of  correct  decision  (Pcd)  and  probability  of  correct 
classification  (Pcc). 


For  example.  Figure  15-4  compares  the  average  game  solver  performance  with  the  average  heu¬ 
ristic  solver  performance.  The  x-axis  represents  the  probability  of  sensor  action  for  the  heuristic 
approach.  Note  that  in  the  Figure,  the  performance  is  obtained  by  averaging  over  different  cases 
where  each  case  represents  a  specific  level-1  and  level-2  value  combination.  Figure  15-5,  on  the 
other  hand,  shows  the  performance  of  a  specific  case  (case  13,  where  the  unit  type  1  (A)  and  tar¬ 
get  type  3  are  the  emphasis)  under  the  same  scenario. 


Similarly,  Figures  15-6  and  15-7  show  the  results  for  scenario  BB  and  Figures  15-8  and  15-9 
show  the  performance  of  scenario  AB.  It  is  clear  that  in  all  cases,  the  game  solver  outperforms 
the  heuristic  solver  most  of  the  time.  We  then  change  the  policy  decision  threshold  from  80%  to 
101%  and  test  the  performance.  The  corresponding  performance  results  are  shown  in  Figures 
15-10  thru  15-15.  Note  that  the  fact  that  the  threshold  is  greater  than  100%  indicates  that  the  en¬ 
emy  strategy  will  not  change  throughout  the  simulation.  In  these  cases,  the  heuristic  strategy 
performs  almost  linearly  proportion  to  the  sensor  action  rate.  This  is  intuitive  since  no  correla¬ 
tion  between  the  enemy  strategy  and  our  SRM  decision  and  therefore  more  sensor  action  imply 
more  sensor  data  and  better  performance.  It  is  interesting  to  see  that  the  game  solver  performs 
approximately  the  same  as  the  heuristic  solver  at  around  70%  sensor  action. 

We  also  change  the  policy  decision  threshold  to  40%  and  compare  the  performance.  The  results 
are  given  in  Figures  15-16  tO  15-21.  The  resulting  performance  curves  are  somewhat  similar  to 
that  of  Figures  5-9  thru  5-15  except  in  these  cases,  the  heuristic  solver  performs  better  in  terms  of 
Pcd  when  the  sensor  action  rate  is  small.  Also,  the  overall  performance  is  worse  with  40% 
threshold  than  that  of  101%  threshold.  Note  that  with  40%  threshold,  the  enemy  strategy 
changes  much  more  rapidly  and  therefore  is  much  harder  to  identify. 
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Sensor  Mode 


Finally,  to  test  the  robustness  of  the  game  solver,  we  modify  the  game  solver  ratio  from  0.5/0. 5 
to  0.9/0. 1.  This  ratio  is  to  balance  between  long  term  and  short  term  prediction/assessment  of 
enemy’s  strategy.  The  results  are  shown  in  Figures  15-22  thru  15-27.  They  are  very  similar  to 
Figure  15-4  thru  15-8  which  demonstrates  the  robustness  of  the  algorithm. 


(set  below  stands  for  strategy  change  threshold) 


Sensor  Mode 


Figure  15-1:  Sensor  Mode  Decision 
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Figure  15-2:  Enemy  Strategy 


Performance  Companion  -  Averege  of  all  Cases/AA 


Figure  15-4:  AA  (set  =  .80/ fixed  initial  strategy  (1)) 
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Performance  Compariaon  -  Cast  13/AA  Performance  Companson  •  Average  of  «H  Cases/B8 


Figure  15-5:  AA(13),  sct=0.80/fixed initial str.  (1)  Figure  15-6: 88,  sct=0.80/fixed initial str.  (2) 


Performance  Comparison  ■  Case  21/B0  Performance  Companson  •  Average  of  all  Cases/AB 


Figure  15-7:  Case  21 /BB,sct=0.80/fixed  initial  str.(2) 


Figure  15-8:  AB,  sct=  80  and  fixed  initial  str.  (2) 


Performance  Companson  -  Case  21 /AB 


Performance  Companson  -  Average  of  all  Cases/AA 


Figure  15-9:  Case  21/ AB,  sct=  0.80/fixed  initial  str. 


Figure  15-10:  AA  sct=  1.01  /fixed  initial  str.  (1) 
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Performance  Companson  -  Case  13/AA  Performance  Companson  *  Average  of  all  Ca$es/BB 


Figure  15-11:  Case  13/AA  sct=  1.01  /fixed initial str.(1) 


Figure  15-12:  BB,  sct=1.01/fixed  initial  str.  (2) 


Parformence  Companson  *  Case  21/BB 


Performance  Companson  Average  of  all  Casas/AB 


Figure  15-14:  AB,  sct=:  1.01/fixed  initial  str.  (2) 


Performance  Comparison  ♦  Case  21 /A B 


Parformence  Companson  •  Average  of  all  Cases/AA 


Figure  15-15:  Case  21/AB,  sct=1.01,  fixed  initial  str.(2) 


Figure  15-16:  AA,  sct=0.40/fixed  initial  str.(1) 
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Pcc  or  Pcd 


Porformance  Comparison  ■  Case  13/AA  Performance  Companeon  -  Average  of  all  Cases/BB 


Performance  Companeon  •  Case  21/BB 


Performance  Companson  -  Average  of  all  Casas/AB 


Figure  15-20:  sct=  0.40  and  fixed  initial  str.(2 ) 


Performance  Companson  -  Caec  21/AB 


Performance  Companson  •  Average  of  all  Cases/AA 


Figure  15-21:  Case  21/AB,  sct=:  0.40/  fixed  initial  str.(2)  Fi9ure  15'22:  AA’  9ame  solver  ratio  0.9/0.1 
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Pcc  of  Pcd 


Performance  Comparison  •  Case  13/AA 


Performance  Companson  *  Average  of  all  Cases/BB 


Figure  15-24: 88,  game  solver  ratio  0.9/0. 1 


Performance  Companson  •  Case  21/BB  Performance  Companson  -  Average  of  all  Cases/AB 


P*rformanct  Comparison  Ctst  21/AB 


Figure  15-27:  Case  21/AB  with  new  game  solver  ratio  0.9/0. 1 
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2.7.  SRM  ENVIRONMENT  FOR  HIGH  LEVEL  FUSION  (GLOBAL) 
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Figure  16:  Going  from  local  to  Global 


Figure  16  depicts  how  a  local  game  theoretic  SRM-Tracker  pair  (one  akin  to  Figure  14  above) 
may  be  put  together  with  other  such  pairs  in  a  distributive  and  decentralized  architecture.  The 
key  distinction  between  the  local  picture  that  was  discussed  in  the  previous  section  (2.6)  and  the 
global  picture  in  Figure  16  is  the  presence  of  common  global  objective  that  was  absent  and  does 
not  arise  in  the  local  setting.  In  other  words,  as  each  local  SRM-Tracker  pair  tries  its  best  to 
maintain  its  local  picture  of  situational  awareness,  there  exists  a  global  objective  that  these  pairs 
are  supposed  to  achieve  together.  Is  it  always  the  case  that  the  decisions  that  Tracker-Sensor  Pair 
has  to  make  to  optimize  its  local  objective  function  coincide  with  the  decisions  that  it  needs  to 
make  to  contribute  maximally  to  the  global  objective?  This  is  often  the  case,  but  unfortunately, 
not  always  true.  It  turns  out  that  there  is  tension  between  the  local  decisions  (which  are  results 
(i.e.  Nash  equilibrium)  of  solving  local  2-person  game)  and  the  global  decisions  (which  are  re¬ 
sults  (i.e.  Nash  equilibrium)  of  solving  global  N-person  game),  and  each  SRM-Tracker  pair  must 
be  able  to  choose  optimally  in  order  not  to  neglect  one  in  favor  of  the  other.  Exactly  how  to  bal¬ 
ance  these  two  competing  needs  is  still  very  much  an  area  of  open  research  (even  in  the  fields  of 
economics  where  game  theory  plays  a  vital  role),  and  a  very  difficult  problem,  which  involves 
solving  N-person  games,  which  itself  is  a  difficult  problem,  as  we  explained  in  previous  sections. 
We  made  some  modicum  of  progress  through  this  research  addressing  this  problem,  which  we 
explain  now. 


2.7.1.  Uncertain  Global  Adversarial  Identity 


The  fact  that  there  is  an  uncertainty  regarding  the  identity  of  the  global  adversary  should  be  of  no 
surprise,  as  there  is  an  uncertainty  in  the  identity  of  the  local  adversaries  (which  make  up  the 
global  adversary)  as  discussed  in  section  2.6.  As  such,  we  tackle  this  issue  in  the  similar  way,  as 
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we  did  for  the  uncertain  games  that  arose  in  the  local  setting.  In  other  words,  we  will  take  the 
probability  distribution  over  all  possible  global  adversary  types  and  use  that  to  define  the  uncer¬ 
tain  game  that  the  global  blue  force  is  engaged  in  against  the  global  red  force.  To  be  more  pre¬ 
cise,  since  we  had  the  following  for  the  local  case  (as  defined  in  section  2.6) 

•  Local  adversary  types  for  each  TSP  =  {Offensive,  Defensive,  Deceptive}; 

If  the  number  of  TSP(Tracker-Sensor-Pair)’s  =  3  for  the  global  case,  then  we  have: 

•  Global  adversary  types  =  {Offensive,  Defensive,  Deceptive} 
where 

•  Global  offensive  adversary  consists  of  at  least  two  local  offensive  adversaries 

•  Global  defensive  adversary  consists  of  at  least  two  local  defensive  adversaries 

•  Global  deceptive  adversary  consists  of  at  least  two  local  deceptive  adversaries 
Defined  as  above,  number  of  each  global  adversarial  type  is  7  as 

•  7  =  1  (3  of  a  kind)  +  C(3,2)  (2  of  a  kind)  x  2  (remaining  2  choices) 

This  still  leaves  cases  where  each  TSP  is  of  a  different  type.  There  are  6  (=  3!)  of  them  and  they 
will  be  classified  as  deceptive  type.  Therefore,  we  have  in  the  end  for  the  global  adversary  types: 

•  7  global  offensive  adversary  consisting  of  at  least  two  local  offensive  adversaries 

•  7  global  defensive  adversary  consists  of  at  least  two  local  defensive  adversaries 

•  13  global  deceptive  adversary  consists  of  at  least  two  local  deceptive  adversaries  or  of  3 
different  local  adversary  types 

And  the  global  adversary  type  at  a  given  point  will  be  a  probability  distribution  over  these  three 
types  calculated  from  the  probability  distribution  of  each  of  the  3  local  adversary  types. 


2.7.2.  Solving  the  global  2-person  Game  (Common  Objective  Generator) 


As  the  identity  of  the  global  adversary  is  being  determined,  so  the  payoff  matrix  for  the  2-person 
game  that  the  blue  and  the  red  are  engaged  in  is  also  being  determined.  We  show  one  particular 
payoff  matrix  we  use  extensively  as  follows: 


global_payoff  matrix  = 


1  2 
2  -1 
-1  4 


-1 

4 

-3 


As  this  is  the  convention,  the  rows  represent  the  strategy  space  of  the  blue  (i.e.  the  sensor  net¬ 
work  of  three  TSP’s)  and  the  columns  represent  the  strategy  space  of  the  global  adversary,  with 
the  following  strategies: 
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•  Global  Blue  Strategies  =  {Sense  A  Lot  (SA),  Sense  Little  (SL),  Don’t  Sense  (DS)} 

•  Global  Red  Strategies  =  {Attack  (AT),  Defend  (DE),  Deceive  (DC)} 

In  other  words,  global  red  adversary  can  employ  the  overall  strategy/objective  of  “attack”,  “de¬ 
fend”  and  “deceive”  strategy  which  it  will  try  to  achieve  using  its  local  agents  to  achieve.  In  re¬ 
sponse,  sensor  network  can  employ  the  overall  strategy/objective  of  “Sense  A  Lot”  which  will 
then  be  communicated  to  the  individual  sensors  in  its  network,  (exactly  how  this  global  objective 
will  be  translated  into  a  local  strategy  is  a  topic  of  the  next  section),  and  the  same  goes  for  the 
global  strategy  /objective  of  “Sense  Little”,  and  finally  for  “Don’t  Sense”.  Their  relative  measure 
of  success  &  failure  employing  different  strategies  are  reflected  in  the  following  table,  which  co¬ 
incides  with  the  payoff  matrix  above: 


Attack  (AT) 

Defend  (DE) 

Deceive  (DC) 

Sense  A  Lot  (SA) 

1 

2 

-1 

Sense  Medium  (SM) 

2 

-1 

4 

Sense  Little  (SL) 

-1 

4 

-3 

For  example,  when  the  global  adversary  is  in  the  attack  mode,  if  the  sensor  system  is  not  em¬ 
ploying  its  distributed  sensors  to  monitor  what  the  adversary  is  doing  (i.e.  DS  strategy),  the  sen¬ 
sor  system  loses  its  measure  of  success  (-1),  and  the  adversary  gains  the  measure  of  success  (1). 
To  the  contrary,  if  the  adversary  is  in  the  defensive  mode,  it  is  to  the  sensor  network’s  advantage 
to  not  sense  (DS),  so  as  to  not  waste  its  sensing  resources,  as  the  adversary  is  in  the  defensive 
mode,  and  not  at  the  current  time  doing  something  that  is  worthy  of  being  sensed. 

The  Nash  equilibrium  for  this  pay  off  matrix  is  as  follows  (using  the  techniques  we  described 
about  2-person  games): 

Nash  blue  =  [  —  -  -I 

U  2  4/ 

Nash  red  =  I  —  —  — 

U  2  4/ 

Now  in  order  to  estimate  the  strategy  or  the  intent  of  the  global  adversary,  we  then  use  the  tech¬ 
niques  that  we  developed  in  the  local  dynamic  2-person  game  situation  to  solve  such  games  as 
above  as  in  the  following  linear  combination: 

global_red_strategy[r  +1]  =  0.2  •  global_Nash_red[/  +  1]  + 

0.2  •  global_red_strategy[r]  + 

0.2  •  local_red_strategy_p  1  [  /  ]  + 

0.2  •  local_red_strategy_p2[ /]  + 

0.2  •  1  oc  a  l_red_st  rate  gy_p3[  t  ] 
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In  other  words,  in  order  to  estimate  the  current  global  strategy,  we  combine  the  global  red  strat¬ 
egy  of  the  previous  time  epoch,  current  observation  of  the  adversarial  global  strategy  seen  from 
the  current  three  local  sensors  and  the  current  global  Nash  equilibrium  (as  a  baseline  estimate). 

However,  there  are  a  few  assumptions  that  we  made  in  writing  down  the  above,  which  we  must 
pursue  in  further  research.  They  are: 

1 .  Even  though  we  have  used  the  equal  waiting  on  all  five  components  that  contribute  to  the 
current  global  strategies,  other  weightings  are  certainly  possible  and  we  will  explore 
more  in  future  research. 


2.  We  have  made  an  assumption  that  the  strategy  space  of  the  local  adversaries  as  well  as 
that  of  the  global  adversary  are  all  equal  to  each  other,  in  order  to  map  the  local  strategy 
space  to  global  strategy  space.  Such  assumption  is  rather  strong,  indeed,  and  in  future  re¬ 
search,  we  will  explore  exactly  what  the  inverse  (or  pseudo-inverse)  mapvE"1  of 


V :  A' 


global 


A', 


local 


3.  We  have  assume  that  the  transition  map: 
cb'  •  a'  ->  A'+l 

^  ^  local  /Y  local 

is  a  simple  identity  map.  However,  this  is  too  simplistic  of  an  assumption  and  a  simple 
linear  map  using  a  transition  map,  akin  to  what  was  done  in  local  case,  should  be  consid- 


0.9 

0.1 

0.1 

ered  (e.g.  O  = 

0.05 

0.8 

0.15  ) 

0 

0.2 

0.8 

Similarly,  for  the  blue,  we  proceed  as  we  did  before  in  the  local  case,  i.e.,  linear  combination  of 
Bayesian  response,  best  possible  blue  response  if  our  estimate  of  the  opponent  is  a  perfect  esti¬ 
mate,  and  Nash  response,  the  equilibrium  point,  that  is  considered  as  a  safe  response  given  the 
opaqueness  to  the  current  adversarial  strategy,  as  follows: 


global_bl  ue_strategy  [  t  +  1]  =  0.5  •  blue_global_Nash_strategy[f  +  1]  + 

0.5  •  blue_global_Bayesian_Response[r  +1] 


2.7.3.  N-person  Game  for  Sensor  Network  Coordinator 

Now  once  we  have  a  common  objective  (namely  “Sense  A  Lot  (SA)”,  “Sense  Little  (SL)”,  or 
“Don’t  Sense  (DS)”)  come  out  from  the  Common  Objective  Generator,  as  we  just  described  in 
the  previous  section,  we  then  have  to  push  down  this  global  common  objective  to  the  local  level 
and  have  it  be  distributed  among  TSP’s  (Tracker-Sensor  Pair).  Such  push-down  and  distribution 
of  the  global  objective  can  happen  in  two  ways: 

1.  If  the  communication  among  the  TSP’s  are  robust),  it  can  happen  cooperatively  among 
the  TSP’s  and  one  way  that  this  can  be  implemented  is  via  the  process  of  finding  endoge- 
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nous  equilibrium  thru  cooperative  fictitious  game  structure  of  Patek,  et.  al.,  as  was  out¬ 
lined  section  2.5 

2.  If  the  communication  among  TSP’s  is  limited  or  compromised,  each  of  the  common  ob¬ 
jective  can  then  be  translated  into  a  non-cooperative  game  which  then  each  TSP  can  then 
play  against  other  TSP’s  to  ensure  the  global  objective  is  met  via  their  competition,  rather 
than  cooperation. 

In  the  current  research,  we  chose  the  method  #2  via  Uncertain  Game  Maker,  though  #  1  is  cer¬ 
tainly  worthy  of  further  investigation  and  future  research. 


2.7.4.  Uncertain  Game  Maker 

We  first  let 

i  =  argmax(blue_global_strategy(y)) 

Is  j* 3 

Now  if  / =  1  (i.e.  global  objective  =  “Sense  A  Lot  (SA)”),  we  have  the  following  3-person  matrix 
game:  (recall,  as  is  convention,  rows  represent  strategy  of  first  player  (PI),  and  columns  repre¬ 
sent  strategies  of  the  second  player  (P2).  Also  recall  that  locally  each  sensing  agent  has  two 
strategies:  {Sense}  and  {Not  Sense}): 


P3  =  {Sense} 

Sense 

Not  Sense 

Sense 

3 

2 

Not  Sense 

2 

1 

P3  =  {Not  Sense} 

Sense 

Not  Sense 

Sense 

2 

i 

Not  Sense 

1 

0 

Nash  solution  for  this  matrix  game,  using  the  principle  of  dominance  is  (1,  1,  1)  as  the  pure  Nash 
equilibrium  strategy. 

Now,  if  /  —  2  (i.e.  the  global  objective  =  “Sense  Medium  (SM)”),  we  then  have: 


P3  =  {Sense} 

Sense 

Not  Sense 

Sense 

3 

2 

Not  Sense 

2 

3 
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P3  =  (Not  Sense} 

Sense 

Not  Sense 

Sense 

2 

3 

Not  Sense 

3 

0 

In  this  case,  there  is  no  pure  Nash  solution  for  this  payoff  matrix,  which  makes  things  more  in¬ 
teresting.  Also,  notice  there  is  payoff  of  3  for  (1,  1,  1).  This  reflects  a  bias  towards  coordination 
to  achieve  the  common  objective  of  “Sensing  A  Little  (SM)”. 


Finally,  if  /  =  3  (i.e.  the  global  objective  =  “Sense  Little  (SL)”),  we  then  have: 


P3  =  {Sense} 

Sense 

Not  Sense 

Sense 

1 

1 

Not  Sense 

1 

2 

P3  =  {Not  Sense} 

Sense 

Not  Sense 

Sense 

1 

2 

Not  Sense 

2 

1 

Nash  solutions  for  this  matrix  game  arc  {(2,  2,  1),  (2,1,2),  (1,2,  2)}  as  pure  Nash  equilibriums 
and 

(2,1,2)  +  "(1,2,2)}  as  a  mixed  Nash  solution. 

Now  whichever  the  global  objective  may  be  (thus  whichever  game  wc  may  be  solving),  the  local 
strategy  supporting  the  global  objective  will  then  be  pushed  down  to  each  local  sensing  agent 
(TSP)  and  it  is  a  (new)  job/capability  of  each  TSP  to  decide  what  to  do  with  this  new  globally- 
inspired  objective. 


2.7.5.  Balancing  Global  and  Local  Needs 

The  figure  below  (Figure  17)  captures  pictorially  the  decision  that  each  TSP  needs  to  make.  Lo¬ 
cally,  at  a  given  time,  using  the  local  L3  module  that  was  described  in  section  2.6,  each  TSP 
needs  to  decide  whether  to  activate  its  sensor  or  not  (Level  3  decision)  which  then  employs  its 
Level  1  /  Level  2  SRM  algorithms  to  determine  whether  to  use  MTI  mode  or  HRR  mode.  Fur¬ 
thermore,  it  also  receives  from  Global  Sensor  Network  Coordinator,  as  described  in  sections 
2.7.4  and  2.7.5,  a  decision  whether  to  sense  or  not  to  sense,  so  as  to  help  achieve  the  global  ob¬ 
jective  in  concert  with  other  TSP’s. 
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Currently,  for  starters,  we  are  using  the  equal  balance  between  the  two.  In  other  words,  we  have 


global  blue  strategy 
against 


>■ 

Intent/ 

Mission  Objectives 


Sense  or  NOT  sense? 
Sometimes,  even  If  we 
have  the  resource,  It's 
better  NOT  to  sense  - 
for  local  reason  or  for 
a  global  objective 


Intent 


Figure  17:  Balancing  between  local  and  global 


_\_  \_ 

stsp  ~  2  s,ocal  +  2  Sf;lnhal 

In  future,  a  more  effective  balancing  algorithm  will  be  employed.  The  one  currently  envisioned 
takes  into  account  the  size  of  uncertainty  that  grows  or  shrinks  over  time  at  the  global  level  and 
also  at  the  local  level.  In  other  words,  we  could  have: 

k,  k2 

TSI  s/'ze(local  uncertainty)  UKul  s/'ze(global  uncertainty)  s‘obal 

2.7.6.  Putting  It  Altogether 

We  now  put  all  these  components  together  and  run  this  hierarchical  simulation  loop  as  described 
in  the  following  pseudo-code: 
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^hierarchical  distributed  game  theoretic  simulation  algorithm> 


for  (each  simulation  step) 

%  First  play  three  separate  local  2-person  games 
for  (each  player) 

Dynamic  2-person  game_solver_for  local  sensing  strategy 
If  (sense) 

activate_sensor  using  SRM  Module 
Evolve  enemy  strategy  using  Strategy  Transition  Module 
Monitor  enemy  strategy  using  Analyst  Reasoning  Module 
Update  enemy  strategy  using  Enemy  Dynamics  Module 

end 

end 

%  Now  play  global  2-person  game 

Calculate  next  global  strategy  via  CommonObjectiveGenerator  using 
Previous  global  strategy  and  previous  local  strategies. 

Find  next  global  n-person  game  via  UncertainGameMaker  using 
current  global  strategy  and  local  adversary  types 
Find  next  globally  inspired  local  strategies  by  solving  the  current  n-person 
game  via  n_person_game_solver 

%  Final  Balancing  act 

Compute  new  sensing  strategies  by  balancing  local  sensor  strategies 
and  globally  Inspired  local  sensing  strategies. 


end; 


And  these  algorithms  are  pictorially  captured  in  the  Figure  18  below: 
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Figure  18:  Final  Simulation  Architecture 
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